    SEAN: Image Synthesis with Semantic Region-Adaptive Normalization

    We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g., FID and PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing: we can interactively edit images by changing segmentation masks or the style for any given region, and we can interpolate styles from two reference images per region. Comment: Accepted as a CVPR 2020 oral paper. The interactive demo is available at https://youtu.be/0Vbj9xFgoU
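
    As a rough illustration of the core idea, the sketch below (PyTorch, with hypothetical names such as RegionStyleNorm) broadcasts per-region style codes over the segmentation mask and turns the result into per-pixel normalization parameters. It is a simplified single-path version; the actual SEAN block additionally blends mask-conditioned and style-conditioned modulation.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class RegionStyleNorm(nn.Module):
            # Simplified SEAN-style layer: instance-normalize activations,
            # then modulate them with scale/shift maps predicted from
            # per-region style codes broadcast over the segmentation mask.
            def __init__(self, channels, style_dim):
                super().__init__()
                self.norm = nn.InstanceNorm2d(channels, affine=False)
                self.to_gamma = nn.Conv2d(style_dim, channels, 3, padding=1)
                self.to_beta = nn.Conv2d(style_dim, channels, 3, padding=1)

            def forward(self, x, mask, styles):
                # x: (B, C, H, W); mask: (B, R, H, W) one-hot regions;
                # styles: (B, R, style_dim), one style code per region.
                mask = F.interpolate(mask, size=x.shape[2:], mode='nearest')
                # Paint each region's style code over its own pixels.
                style_map = (styles[..., None, None] * mask[:, :, None]).sum(1)
                h = self.norm(x)
                return h * (1 + self.to_gamma(style_map)) + self.to_beta(style_map)

    Interactive editing then amounts to replacing the mask or the style code of a single region and re-running the generator.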

    Labels4Free: Unsupervised Segmentation using StyleGAN

    We propose an unsupervised segmentation framework for StyleGAN-generated objects. We build on two main observations. First, the features generated by StyleGAN hold valuable information that can be utilized for training segmentation networks. Second, the foreground and background can often be treated as largely independent and swapped across images to produce plausible composited images. For our solution, we propose to augment the StyleGAN2 generator architecture with a segmentation branch and to split the generator into a foreground and a background network. This enables us to generate soft segmentation masks for the foreground object in an unsupervised fashion. On multiple object classes, we report results comparable to state-of-the-art supervised segmentation networks, and a clear improvement over the best unsupervised segmentation approach in both qualitative and quantitative metrics. Project Page: https://rameenabdal.github.io/Labels4Free
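
    The second observation reduces to plain alpha compositing, which the segmentation branch learns to predict; a minimal sketch (generic compositing, not the authors' code) follows.

        import torch

        def composite(fg, bg, alpha):
            # fg, bg: (B, 3, H, W) outputs of the foreground/background
            # branches; alpha: (B, 1, H, W) soft mask in [0, 1].
            return alpha * fg + (1.0 - alpha) * bg

        # Swapping backgrounds across a batch should still yield plausible
        # images, which is the unsupervised training signal exploited here.
        fg = torch.rand(2, 3, 64, 64)
        bg = torch.rand(2, 3, 64, 64)
        alpha = torch.rand(2, 1, 64, 64)
        mixed = composite(fg, torch.roll(bg, shifts=1, dims=0), alpha)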

    Fatigue Life Simulation and Analysis of Aluminum Alloy Sheet Self-piercing Riveting

    A fatigue life prediction model for self-piercing riveting components of aluminum alloy is established, and the effects of roughness and residual stress on the fatigue life of self-piercing riveting components are analyzed with the model. The finite element software ABAQUS and the fatigue analysis software FE-SAFE are used to study these effects through finite element simulation and a multivariate orthogonal regression experiment. Quantitative relations between fatigue life and three variables (roughness, residual stress, and maximum stress) are fitted, and the variation trend of fatigue life with roughness and residual stress is obtained. Among roughness, residual stress, maximum stress, and the two interactions, the strongest influences on fatigue life are, in order: residual stress, the interaction between roughness and residual stress, and roughness. When the maximum stress is fixed, the fatigue life decreases as roughness increases at a given residual stress. The average error between the fatigue experiment results and the simulation results is 9.74%, which indicates that the simulation results are reliable.
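
    The regression step can be illustrated with a small least-squares fit. All numbers below (variable ranges, coefficients, noise) are made up for the sketch; only the model form, log fatigue life against roughness, residual stress, maximum stress, and an interaction term, follows the study.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 30
        Ra    = rng.uniform(0.8, 3.2, n)      # surface roughness, um (assumed range)
        S_r   = rng.uniform(-60.0, 60.0, n)   # residual stress, MPa (assumed range)
        S_max = rng.uniform(150.0, 210.0, n)  # maximum stress, MPa (assumed range)

        # Synthetic ground truth, only to exercise the fit; the paper's real
        # coefficients and measured lives are not reproduced here.
        logN = (6.0 - 0.12 * Ra - 0.004 * S_r - 0.006 * S_max
                - 0.002 * Ra * S_r + rng.normal(0.0, 0.02, n))

        # Design matrix: intercept, main effects, and the interaction the
        # study singles out (roughness x residual stress).
        X = np.column_stack([np.ones(n), Ra, S_r, S_max, Ra * S_r])
        coef, *_ = np.linalg.lstsq(X, logN, rcond=None)
        for name, c in zip(['1', 'Ra', 'S_r', 'S_max', 'Ra*S_r'], coef):
            print(f'{name:>6}: {c:+.4f}')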

    3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

    Modern 3D-GANs synthesize geometry and texture by training on large-scale datasets with a consistent structure. Training such models on stylized, artistic data, where the geometry is highly variable and the camera information is often unknown, has not yet been shown to be possible. Can we train a 3D GAN on such artistic data while maintaining multi-view consistency and texture quality? To this end, we propose an adaptation framework in which the source domain is a pre-trained 3D-GAN and the target domain is a 2D-GAN trained on artistic datasets. We then distill the knowledge from the 2D generator to the source 3D generator. To do so, we first propose an optimization-based method to align the distributions of camera parameters across domains. Second, we propose regularizations necessary to learn high-quality texture while avoiding degenerate geometric solutions, such as flat shapes. Third, we show a deformation-based technique for modeling the exaggerated geometry of artistic domains, which, as a byproduct, enables personalized geometric editing. Finally, we propose a novel inversion method for 3D-GANs that links the latent spaces of the source and target domains. Our contributions, for the first time, allow for the generation, editing, and animation of personalized artistic 3D avatars on artistic datasets. Comment: Project Page: https://rameenabdal.github.io/3DAvatarGAN
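
    A hedged sketch of the distillation loop is given below. The interfaces (g3d, g2d_teacher, cam_sampler, an LPIPS-style perceptual loss) and the shared-latent pairing are illustrative placeholders, not the paper's exact training recipe.

        import torch

        def distill_step(g3d, g2d_teacher, lpips, cam_sampler, opt, batch=4,
                         device='cuda'):
            # Assumed setup: the student 3D generator and the frozen 2D
            # teacher share a latent space, and cameras are drawn from the
            # aligned camera distribution found in the first step.
            z = torch.randn(batch, g3d.z_dim, device=device)
            cams = cam_sampler(batch)            # aligned camera parameters
            student = g3d(z, cams)               # rendered 3D view
            with torch.no_grad():
                teacher = g2d_teacher(z)         # stylized 2D teacher image
            loss = lpips(student, teacher).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()

    In the full method this loss would be combined with the texture and geometry regularizations described above.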

    BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

    We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of multi-modality machine translation. Compared with the widely used How2 and VaTeX datasets, BigVideo is more than 10 times larger, consisting of 4.5 million sentence pairs and 9,981 hours of video. We also introduce two deliberately designed test sets to verify the necessity of visual information: Ambiguous, which contains ambiguous words, and Unambiguous, in which the textual context is self-contained for translation. To better model the common semantics shared across texts and videos, we introduce a contrastive learning method in the cross-modal encoder. Extensive experiments on BigVideo show that: a) visual information consistently improves the NMT model in terms of BLEU, BLEURT, and COMET on both the Ambiguous and Unambiguous test sets; b) visual information helps disambiguation compared to the strong text baseline on terminology-targeted scores and in human evaluation. The dataset and our implementations are available at https://github.com/DeepLearnXMU/BigVideo-VMT. Comment: Accepted to ACL 2023 Findings
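
    The contrastive component can be sketched as a symmetric InfoNCE loss over paired text/video embeddings; the temperature and the assumption of a shared embedding space are illustrative choices, not taken from the paper.

        import torch
        import torch.nn.functional as F

        def cross_modal_contrastive_loss(text_emb, video_emb, temperature=0.07):
            # text_emb, video_emb: (B, D) pooled encoder outputs for the
            # B matched subtitle/video pairs in a batch.
            text_emb = F.normalize(text_emb, dim=-1)
            video_emb = F.normalize(video_emb, dim=-1)
            logits = text_emb @ video_emb.t() / temperature   # (B, B)
            targets = torch.arange(logits.size(0), device=logits.device)
            # Matched pairs sit on the diagonal; score retrieval both ways.
            return 0.5 * (F.cross_entropy(logits, targets)
                          + F.cross_entropy(logits.t(), targets))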